Adaptive Strategies and Regret Minimization in Arbitrarily Varying Markov Environments

Authors

  • Shie Mannor
  • Nahum Shimkin

Similar resources

Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments

The nonstochastic multi-armed bandit problem, first studied by Auer, Cesa-Bianchi, Freund, and Schapire in 1995, is a game of repeatedly choosing one decision from a set of decisions (“experts”), under partial observation: in each round t, only the cost of the decision played is observable. A regret minimization algorithm plays this game while achieving sublinear regret relative to each decisi...

Adaptive Regret Minimization in Bounded-Memory Games

Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g. a driver deciding what route to drive to work every day. While regret minimization has been extensively studied in repeated games, we study regret minimization for a richer class of games called bounded memory games. In each round of ...

Inference-based Decision Making in Games

Background: Reinforcement learning in complex games has traditionally been the domain of value- or policy-iteration algorithms, resulting from their effectiveness in planning in Markov decision processes, before algorithms based on regret minimization guarantees such as upper confidence bounds applied to trees (UCT) and counterfactual regret minimization were developed and proved to be very succe...

A Regret Minimization Approach in Product Portfolio Management with respect to Customers’ Price-sensitivity

In an uncertain and competitive environment, product portfolio management (PPM) becomes more challenging for manufacturers to decide what to make and establish the most beneficial product portfolio. In this paper, a novel approach in PPM is proposed in which the environment uncertainty, competitors’ behavior and customer’s satisfaction are simultaneously considered as the most important criteri...

A Robust Adaptive Observer-Based Time Varying Fault Estimation

This paper presents a new observer design methodology for a time varying actuator fault estimation. A new linear matrix inequality (LMI) design algorithm is developed to tackle the limitations (e.g. equality constraint and robustness problems) of the well known so-called fast adaptive fault estimation observer (FAFE). The FAFE is capable of estimating a wide range of time-varying actuator fault...

Publication date: 2001